Course Syllabus

Course Information

Course Title

Data Science and Social Inquiry (資料科學與社會研究)

Semester

111-1 (Fall 2022)

Offered To

Graduate Institute of Economics, College of Social Sciences

Instructor

林明仁

Course Number

ECON5166

Course Identifier

323 U1250

Class Section

Credits

3.0

Full/Half Year

Half year

Required/Elective

Elective

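Class Time

Monday, periods 2, 3, 4 (9:10~12:10)

Classroom

Social Sciences 502

Remarks

Required course for the undergraduate cross-disciplinary specialization "Data Science and Social Analysis". Co-taught with 陳由常.
Restricted to third-year undergraduates and above, master's students and above, or doctoral students.
Enrollment cap: 60 students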

Course Description

Please see

https://docs.google.com/document/d/1Va_CnqUgMtGCAO6hRUENvu4F7j2KTsWAXSBxKP0OZ0M/edit?usp=sharing

for detailed information. The problem set below helps you decide whether you are ready for this course (draft version; do not start working on it yet).

https://drive.google.com/file/d/1fWoYhHQmbVyyyupOJQp73sZDZRD_PZsK/view?usp=sharing

---

Econ 5166 is an introductory course on “classical” machine learning (ML) methods (e.g., decision trees and LASSO), with an emphasis on their applications in social science research. This course is most suitable for students who have finished their first statistics course, have some experience in data manipulation, and would like to explore the ideas behind machine learning.

While many great ML courses are already offered at NTU, our course distinguishes itself through two features. First, we will emphasize the link between machine learning and statistics. ML models, as well as the associated techniques (e.g., cross-validation), will be taught through the lens of statistics. In particular, we will also cover hypothesis testing in a data-rich environment, a topic that is often ignored in typical ML courses.

Second, we will examine how and why ML is used in social science research. For each method covered in class, we will go over in depth a research paper that uses that method, to understand why there is a use case for ML. We will also write our own code (both in class and in homework assignments) to replicate and extend these studies, helping you gain experience in handling real data and sharpen your practical data analysis skills.
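As an illustration of the cross-validation technique mentioned above, here is a minimal sketch in plain Python. It is not course material: the simulated data, the one-predictor ridge model without an intercept, the fold assignment, and the penalty grid are all assumptions chosen to keep the example short.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for y = b*x (one predictor, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def k_fold_mse(xs, ys, lam, k=5):
    """Average held-out mean squared error over k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Assumed fold rule: every k-th observation forms the held-out fold.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        b = ridge_slope(train_x, train_y, lam)
        errs = [(ys[i] - b * xs[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(errs) / len(errs))
    return sum(fold_errors) / k

# Simulated data (hypothetical): true slope 2, unit-variance noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

# Pick the penalty with the smallest cross-validated error.
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_scores = {lam: k_fold_mse(xs, ys, lam) for lam in grid}
best_lam = min(cv_scores, key=cv_scores.get)
print(cv_scores, best_lam)
```

The held-out error plays the role of an estimate of out-of-sample prediction risk, which is one way to read cross-validation statistically: the penalty is chosen by the data rather than fixed in advance.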

This course will cover four ML topics: PCA and clustering, empirical Bayes and large-scale inference, LASSO and ridge regression, and tree-based methods. At the beginning of this course, we will also have a high-level introductory topic on quantitative research in social science that partly serves as a review of statistics and Python data analysis.
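To give a flavor of the LASSO and ridge topic, the sketch below contrasts how the two penalties shrink coefficients in the textbook special case of an orthonormal design (X'X = I). It assumes the objectives (1/2)||y - Xb||^2 + lam*||b||_1 for LASSO and (1/2)||y - Xb||^2 + (lam/2)*||b||^2 for ridge; the OLS coefficients and the penalty value are made-up numbers, not taken from the course.

```python
def soft_threshold(z, lam):
    """LASSO solution for one coefficient when the design is orthonormal."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero

def ridge_shrink(z, lam):
    """Ridge solution for one coefficient when the design is orthonormal."""
    return z / (1.0 + lam)  # proportional shrinkage, never exactly zero

# Hypothetical OLS estimates and penalty level.
ols = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

lasso = [soft_threshold(z, lam) for z in ols]
ridge = [ridge_shrink(z, lam) for z in ols]

print("LASSO:", lasso)
print("ridge:", ridge)
```

Under these assumptions, LASSO zeroes out the two small coefficients while ridge only scales every coefficient toward zero, which is why LASSO is often described as performing variable selection.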

Course Objectives

1. Be able to explain the use cases of classical machine learning methods in both business and academic settings
2. Be able to conduct, organize, and deliver a data analysis project in a professional way

Course Requirements

1. Homework
2. Midterm
3. Final Project

Expected Weekly Study Hours Outside Class

5 hours per week

Office Hours

Every Monday, 14:00~15:30

Required Readings

References

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning: with Applications in R (2nd ed.). Springer.

Grading
(for reference only)

No.	Item	Percentage	Description
1.	Homework Assignment	40%	We will have, on average, one problem set every three weeks. Each problem set will consist of theoretical questions on statistical methods (30%), programming questions that ask you to implement some of the methods taught in class (40%), and data analysis questions using real data sets (30%). A late submission will receive a 50% penalty if it is submitted within three days of the deadline. Submissions more than three days late will not be accepted without prior permission.
2.	Midterm	20%	We will most likely hold the midterm after we finish topic 2. The style of the midterm will match the style of the homework, except that you will need to write code on paper. I will provide references for the functions needed during the midterm so that you don’t have to memorize the exact syntax.
3.	Final Project	40%	Assuming that we return to the classroom (rather than remote learning), your final project grade will consist of: proposal (5%); recorded rehearsal before the final presentation (5%); in-class final presentation (20%); poster session presentation (10%).
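The weights in the table can be combined as in the following sketch. It is only one reading of the policy: it assumes every component is scored on a 0-100 scale, that all problem sets count equally, and that the "50% penalty" halves the score of a late problem set. The function names and the example scores are hypothetical.

```python
# Component weights taken from the grading table.
COURSE_WEIGHTS = {"homework": 0.40, "midterm": 0.20, "final_project": 0.40}

def problem_set_score(raw_score, days_late=0):
    """Apply the late policy to one problem set (assumed 0-100 scale)."""
    if days_late <= 0:
        return raw_score
    if days_late <= 3:
        return 0.5 * raw_score  # assumed meaning of the 50% penalty
    return 0.0  # not accepted after three days without prior permission

def course_grade(homework_scores, midterm, final_project):
    """Weighted course grade, assuming equally weighted problem sets."""
    homework_avg = sum(homework_scores) / len(homework_scores)
    return (COURSE_WEIGHTS["homework"] * homework_avg
            + COURSE_WEIGHTS["midterm"] * midterm
            + COURSE_WEIGHTS["final_project"] * final_project)

# Hypothetical student: the second problem set was two days late.
homework = [
    problem_set_score(90),
    problem_set_score(80, days_late=2),
    problem_set_score(100),
]
grade = course_grade(homework, midterm=85, final_project=90)
print(round(grade, 2))
```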

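Accommodations for Students with Difficulties

Class format	Supplemented with recordings
Assignment submission
Exam format
Other

Course Schedule

Week	Date	Topic
No data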